We prove an uncertainty relation, which imposes a bound on any joint measurement of
position and momentum. It is of the form $(\Delta P)(\Delta Q) \geq C\hbar$, where the
'uncertainties' quantify the difference between the marginals of the joint measurement
and the corresponding ideal observable. Applied to an approximate position measurement
followed by a momentum measurement, the uncertainties become the precision $\Delta Q$
of the position measurement, and the perturbation $\Delta P$ of the conjugate variable
introduced by such a measurement. We also determine the best constant $C$, which is
attained for a unique phase space covariant measurement.



1 Introduction 

Heisenberg's Uncertainty Relation $(\Delta Q)(\Delta P) \geq \hbar/2$ is one of the most
fundamental features of quantum theory, and is taught in even the most basic course on
the subject. All too often, however, teachers succumb to the persistent bad habit of
proving the relations as an inequality on variances for arbitrary state preparations,
but then going on to explain their 'physical meaning' in terms of a perturbation of the
momentum of a particle caused by an approximate position measurement. Since the usual
proof contains nothing of that sort, attentive students quickly get the impression that
quantum uncertainty rubs off on their teachers as some kind of conceptual fuzziness. Our
aim in this paper is to state the measurement aspect of uncertainty as rigorously as has
become standard for the preparation aspect and, of course, to prove the corresponding
inequality.

Both aspects of uncertainty go back all the way to Heisenberg's paper [1] in which
the relations were first introduced, and it is perhaps instructive to disentangle the
richness of Heisenberg's paper a little bit. He begins his discussion with the famous
example of a position measurement on an electron by observation under a $\gamma$-ray
microscope: the resolution $\Delta Q$ of such a device is of the order of the wavelength
$\lambda$ of the photons. However, the interaction gives the electron a Compton kick,
transferring an uncontrolled momentum of the order of the momentum of the photon, i.e.,
$\Delta P \sim 2\pi\hbar/(\Delta Q)$. Heisenberg paraphrases this by saying that precisely
at the moment of interaction, i.e., at the moment the electron's position "becomes known",
the momentum "becomes unknown" in accordance with the Uncertainty Relation. He
observes that this is related to the commutation relations, and announces that this
"direct mathematical connection" will be demonstrated later in the paper. Disappointingly,
this demonstration (on p. 180) turns out to be an order-of-magnitude discussion
of the spread of Gaussian wave packets.

After Heisenberg the stringency of this demonstration was improved considerably,
beginning with Kennard [2], and a version for general non-commuting quantities by
Robertson [3]. These mathematical formulations fix the meaning of $\Delta P$ and
$\Delta Q$ as the square roots of variances, and replace Heisenberg's own notation
"$\sim$" for "of the order of magnitude of ..." by a rigorous inequality, in which even
the constant $\hbar/2$ is precisely optimal. Since the proof involves just an elementary
application of the Schwarz inequality, it has become standard textbook material, and
Heisenberg himself seems to have adopted it as the principal formulation of uncertainty
in his later writings. The meaning of the Uncertainty Relations in this formulation is
again summarized in Fig. 1. Obviously, they refer to two separate experiments, in the
sense that to each single quantum particle either a position or a momentum measurement
is applied. The preparation is the same in both cases, so the relations are best seen
as a constraint on the possibility of preparing states with low variances.

Figure 1: The Preparation Uncertainty Relation refers to the variances in two
separate ideal measurements on the same state.

But what became of the microscope? Clearly, Heisenberg discusses a simple measurement
process, in which the initial preparation of the electrons plays no important
role. Position and momentum are both measured for the same particle (even if imperfectly).
The key observation is that the measurement of position necessarily disturbs
the particle, so that the momentum is changed by the measurement. Indeed, it is a
fundamental theorem of quantum theory that there is no measurement without perturbation.
More precisely, if the output quantum states of a measuring device coincide
with the input states for all inputs, then the measured values are statistically
independent of the input, i.e., no information is gained from the 'measurement'. But
this statement captures none of the quantitative content of Heisenberg's discussion.
Figure 2 shows how we might understand the uncertainties for the microscope: The
quantum system is first subject to an approximate position measurement $Q'$. This is
not an ideal measurement $Q$, since the $\gamma$-rays have non-zero wavelength. So
$\Delta Q$ is some measure of the difference between $Q$ and $Q'$. The next step is a
measurement of momentum. Due to the previous perturbation of the system, we cannot hope
to recover precisely the momentum of the initial particle. So if $P'$ is the momentum
measurement (including the prior perturbation) we will see a difference $\Delta P$ to an
ideal momentum measurement $P$. The claim of the Measurement Uncertainty Relations is
that $\Delta Q\,\Delta P \geq C\hbar$ for some constant $C$. The aim of this paper is to
do for this relation what Kennard did for the Preparation Uncertainty Relation: to give
a rigorous definition of the quantities involved and to prove the inequality as a Theorem.

Figure 2: The Measurement Uncertainty Relations studied in this paper refer
to the deviations of the marginals of a joint measurement M from the ideal
position and momentum observables. The joint measurement can be realized by
first a position measurement Q' and then a momentum measurement P'.

The reason why this was not done 70 years ago might be that $\Delta Q$ is a difference
between observables like $Q$ and $Q'$, which are never measured in the same experiment.
Therefore a quantity like the expectation of $(q - q')^2$ makes no sense at all. So we
have to define $\Delta Q$ as a distance between the probability distributions of $q$ and
$q'$, which requires some conceptual work (see Section 2).

It is clear that we can make devices $M$, which for a particular input state $\rho$
produce outputs with precisely the same distributions as the ideal measurements. Indeed,
we can simply make $M$ a random generator for an arbitrary pair of distributions. Such
a device $M$ would utterly fail to reproduce the distributions for other input states, of
course. Therefore we will define $\Delta Q$ (and similarly $\Delta P$) as the worst case
distance between the probability distributions of $Q$ and $Q'$.

In previous work on measurement uncertainty only covariant joint measurements
were considered, i.e., measurements with the expected transformation behavior with
respect to phase space translations. In this case, which will also play a crucial role
in the present paper, the conceptual problem of interpreting the $\Delta$s is much easier:
the marginals of a joint measurement can then be simulated by adding to the
results of an ideal measurement some noise, which is independent of the input. Hence
any parameter quantifying the size of the noise (e.g., the variance) will do. Discussions
of uncertainty in this setting can be found in many places, not least in the work of
Holevo [5]. The new contribution in this article is a definition of the $\Delta$s, which
makes sense without covariance, and the correspondingly extended inequality.

This paper is organized as follows. We will describe the precise definitions of $\Delta P$
in Section 2 and also state our Theorem. In Section 3 we describe how to compute
$\Delta P$ and $\Delta Q$ for the special case of measurements, which are covariant with
respect to phase space translations, and show how to obtain the best constant $C$ in this
restricted class. Finally, in Section 4 we show that general measurements $M$ never
outperform the covariant ones, i.e., the bounds previously established also hold for
joint measurements without assuming any covariance condition. Some related ideas
and versions of uncertainty will be discussed in the last Section 5 of the paper.

2 Distance of observables on a metric space 
2.1 Monge distance of probability measures 

Let us fix some notation. If $X$ is some measurable space (i.e., a space equipped
with a $\sigma$-algebra of 'measurable sets'), a probability measure $\mu$ on $X$ assigns
to each measurable set a probability in a countably additive way. Equivalently, we can
consider the expectation value functional induced by $\mu$, i.e.,
$\mu(f) = \int \mu(dx)\, f(x)$, where $f$ is any bounded measurable function
$f : X \to \mathbb{R}$. This functional, which we will denote by the same letter $\mu$, is
also called the Radon measure associated with $\mu$. Whether a measure is primarily seen
as a function on sets or as a linear functional is largely a matter of taste. The Radon
measure point of view will have advantages in Section 2.4, where we need to discuss
measures with non-zero weight at infinity. Hence we will use it throughout the paper.
By $\delta_x$ we denote the point measure at $x \in X$, i.e., $\delta_x(f) = f(x)$ for
all $f$.

A natural way of describing the difference between two probability measures $\mu_1$
and $\mu_2$ is to take the largest difference in probabilities they can assign to any
event, i.e., (up to a conventional factor 2):

    $\|\mu_1 - \mu_2\|_1 = 2 \sup_{\sigma} |\mu_1(\sigma) - \mu_2(\sigma)|
                         = \sup_{|f| \le 1} |\mu_1(f) - \mu_2(f)|$ ,    (1)

where the first supremum is over all measurable sets $\sigma$, and the second is over all
measurable functions with $|f(x)| \le 1$ for all $x$. This quantity is known as the norm
difference with respect to the norm of "total variation".

However, this distance between probability measures is totally useless for defining
a quantity like $\Delta Q$. As measurable spaces, $X = \mathbb{R}$ and $X = \mathbb{R}^2$
are isomorphic, so this structure knows nothing of the topology of $X$, and of the
closeness of points in $X$. For example, take two point measures $\delta_{x_1}$ and
$\delta_{x_2}$ for distinct points $x_1, x_2 \in X$. Since there is a measurable set
containing $x_1$ but not $x_2$, we always have $\|\delta_{x_1} - \delta_{x_2}\|_1 = 2$,
even if the points are "very close" and so the two point measures describe practically
indistinguishable probability distributions.
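
The point can be checked on a toy model. The following is a minimal sketch, assuming finitely supported measures stored as dictionaries from points to weights; the helper names `total_variation` and `point_measure` are hypothetical and only serve this illustration.

```python
# Toy model (an assumption of this sketch): a measure with finite support
# is a dict {point: weight}.

def total_variation(mu1, mu2):
    """Norm difference (1): for finitely supported measures this is the
    sum over points of |mu1({x}) - mu2({x})|."""
    support = set(mu1) | set(mu2)
    return sum(abs(mu1.get(x, 0.0) - mu2.get(x, 0.0)) for x in support)

def point_measure(x):
    """Point measure delta_x."""
    return {x: 1.0}

# The norm difference does not notice how close the two points are.
for gap in (1.0, 1e-3, 1e-9):
    print(gap, total_variation(point_measure(0.0), point_measure(gap)))
```

Each line of output shows the value 2, whatever the separation of the two points.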

In order to set up a quantitative notion of the distance of probability measures, 
according to which nearby point measures would be close, too, we must have a no- 
tion of closeness for points to begin with. Therefore, we fix a metric d on X. The 
only technical requirement linking the metric and the measurable structure is that all 
continuous functions for the metric are measurable. The idea is then to define the 
distance between probability measures as the largest difference of expectation values 
on "slowly varying functions" . 

Definition 1 Let $X$ be a metric space with metric $d$. We define the Lipschitz ball
$\Lambda$ of $(X, d)$ as the set of bounded functions $f$ such that

    $|f(x) - f(y)| \le d(x, y)$ ,    (2)

for all $x, y \in X$. Then, for any two probability measures $\mu_1, \mu_2$ on $X$ we
define the distance as

    $d(\mu_1, \mu_2) = \sup_{f \in \Lambda} |\mu_1(f) - \mu_2(f)|$ .    (3)

Strictly speaking, it is another abuse of notation to use the same letter for the metrics
on points and on measures. However, the two are very closely related. For example,
if we take two point measures, we find $d(\delta_x, \delta_y) = d(x, y)$, where the
inequality "$\le$" follows by definition of the Lipschitz ball, and the reverse inequality
follows by observing that the function $f(z) = d(x, z)$ (with a suitable cutoff to make
it bounded) is in $\Lambda$ by the triangle inequality.
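
As a small numerical sketch of this argument on the real line, assuming $d(x,y) = |x - y|$: the function `witness` below is the cut-off distance function from the text, and `is_in_lipschitz_ball` is a hypothetical helper that checks condition (2) on a finite grid only.

```python
def d(x, y):
    """Metric on the real line."""
    return abs(x - y)

def witness(x, cutoff):
    """f(z) = min(d(x, z), cutoff): bounded, and 1-Lipschitz by the triangle inequality."""
    return lambda z: min(d(x, z), cutoff)

def is_in_lipschitz_ball(f, points, tol=1e-12):
    """Check |f(s) - f(t)| <= d(s, t) on a finite set of sample points."""
    return all(abs(f(s) - f(t)) <= d(s, t) + tol for s in points for t in points)

x, y = 0.3, 0.3 + 1e-3
f = witness(x, cutoff=10.0)
grid = [i / 50.0 - 2.0 for i in range(201)]

# |delta_x(f) - delta_y(f)| = |f(x) - f(y)| is a lower bound on d(delta_x, delta_y).
lower_bound = abs(f(x) - f(y))
print(is_in_lipschitz_ball(f, grid), lower_bound, d(x, y))
```

The lower bound coincides with $d(x,y)$, which together with the upper bound from (2) gives the stated equality for point measures.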

There is an alternative "dual" definition of this distance, going back to a problem
by G. Monge [6] in 1781. Consider, instead of two probability distributions, two heaps
of soil of equal volume and the task of transforming one heap into the other by moving
around small amounts of soil. Suppose that for each such move we have to pay a price
proportional to the amount and to the distance. Then the lowest possible price for
the transformation is called the Monge distance between $\mu_1$ and $\mu_2$. To make this
definition explicit, suppose we note for each bit of soil the initial and final location.
This will result in a probability measure $\mu_{12}$ on $X \times X$, whose marginals are
the given measures $\mu_1$ and $\mu_2$, respectively. The price paid will be proportional to

    $D(\mu_{12}) = \int \mu_{12}(dx, dy)\, d(x, y)$ .    (4)

Clearly, for any $f \in \Lambda$, we will have

    $\mu_1(f) - \mu_2(f) = \int \mu_{12}(dx\, dy)\, (f(x) - f(y)) \le D(\mu_{12})$ .    (5)

Then the supremum of the left hand side is the distance defined in Definition 1,
whereas the infimum of the right hand side is the Monge distance. Due to a 1942 paper
of L. Kantorovich [7] the two are, in fact, the same. This result is very much in the
spirit of modern duality theory of convex optimization problems. Duality also helps to
understand the structure of maximizing functions $f$ and minimizing joint distributions
$\mu_{12}$, which tend to be supported on the graph of a function, provided the measures
$\mu_1$ and $\mu_2$ are not too lumpy. Uniqueness can be enhanced [8] by replacing
$d(x, y)$ in the objective functional by $d(x, y)^{1+\varepsilon}$, thereby putting an
extra penalty on scattering mass, and then letting $\varepsilon \to 0$.

Position and momentum take their values in a vector space, so we will briefly
note some special properties of the Monge metric in this case. We assume that the
metric $d$ is consistent with the linear structure, namely translationally invariant, and
homogeneous with respect to scaling. In other words, we require $d(x, y) = |x - y|$ for
some vector space norm $|\cdot|$. The scaling property is important so that we can assign
to distances the same physical units as to the coordinates. Another key operation that
requires the vector space structure is adding noise from an independent source. On
the level of probability measures this is represented by the convolution $\mu * \nu$:

    $(\mu * \nu)(f) = \int \mu(dx) \int \nu(dy)\, f(x + y)$ .    (6)

Basic properties of the metric on probability measures are summarized in the following
Lemma.

Lemma 1 Consider $X = \mathbb{R}^n$, with a metric $d$ given by a vector space norm.

1. For any probability measures $\mu, \nu$ we have the inequality

    $d(\mu, \mu * \nu) \le \int \nu(dy)\, |y|$ .

2. Let $n = 1$, and let $F_i(t) = \mu_i((-\infty, t])$ denote the distribution functions
of two probability measures on $\mathbb{R}$. Then

    $d(\mu_1, \mu_2) = \int_{-\infty}^{\infty} dt\, |F_1(t) - F_2(t)|$ .

The first estimate follows by inserting for $\mu_{12}$ in (5) the joint distribution of
$x$ and $x + y$ implied by the independence of $x$ and $y$ according to (6). For the second
statement note that we can take the supremum over $f \in \Lambda$ over the subset of
piecewise differentiable functions with $|f'(x)| \le 1$ such that $f'$ has compact support,
and write

    $\mu(f) = \int \mu(dx)\, f(x) = -\int dt\, F(t)\, f'(t)$

up to boundary terms which cancel in the difference $\mu_1(f) - \mu_2(f)$. This also
provides a formula for the maximizing $f$: we take $f'(x) = \pm 1$, depending on the sign of
$F_1(x) - F_2(x)$.
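
Both statements of the Lemma are easy to try out on finitely supported measures on the real line. The sketch below is only an illustration under that assumption; the names `monge_1d`, `convolve` and `sorted_matching_cost` are hypothetical. The distance is computed from the distribution functions, and for equally weighted atoms it is compared with the cost (4) of the monotone pairing, which is one admissible transport plan.

```python
def monge_1d(mu1, mu2):
    """Integral of |F1(t) - F2(t)| dt for measures given as {point: weight}."""
    points = sorted(set(mu1) | set(mu2))
    F1 = F2 = total = 0.0
    for left, right in zip(points, points[1:]):
        F1 += mu1.get(left, 0.0)   # distribution functions on [left, right)
        F2 += mu2.get(left, 0.0)
        total += abs(F1 - F2) * (right - left)
    return total

def convolve(mu, nu):
    """Convolution (6): distribution of x + y for independent x ~ mu, y ~ nu."""
    out = {}
    for x, wx in mu.items():
        for y, wy in nu.items():
            out[x + y] = out.get(x + y, 0.0) + wx * wy
    return out

def sorted_matching_cost(xs, ys):
    """Cost (4) of pairing equally weighted atoms in increasing order."""
    return sum(abs(x - y) for x, y in zip(sorted(xs), sorted(ys))) / len(xs)

# Statement 1: adding independent noise moves mu by at most the mean size of the noise.
mu = {0.0: 0.5, 1.0: 0.5}
nu = {-0.25: 0.5, 0.25: 0.5}
noise_size = sum(w * abs(y) for y, w in nu.items())
print(monge_1d(mu, convolve(mu, nu)), "<=", noise_size)

# Statement 2 against an explicit transport plan.
xs, ys = [0.0, 1.0, 3.0], [0.5, 2.0, 2.5]
uniform = lambda pts: {p: 1.0 / len(pts) for p in pts}
print(monge_1d(uniform(xs), uniform(ys)), sorted_matching_cost(xs, ys))
```

In this example the distribution function formula and the cost of the monotone pairing agree, as one expects from the duality between (3) and (4).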

2.2 Distance of Observables 

Let us now consider observables over $X$, i.e., quantum devices, which produce an
output $x \in X$ in every single experiment. Let us take the quantum particles to be
described in a Hilbert space $\mathcal{H}$, so that every preparation of quantum particles
is described by a density operator $\rho$. For any such preparation $\rho$, the outputs of
the device are then distributed with respect to a probability measure $\mu_\rho$ on $X$.
Since the map $\rho \mapsto \mu_\rho(f)$ is affine in $\rho$ and bounded, for every bounded
measurable function $f$, there is an operator $F(f) \in \mathcal{B}(\mathcal{H})$ such that

    $\mu_\rho(f) = \mathrm{tr}(\rho\, F(f))$ .    (8)

Then $F$ is a linear operator, taking positive functions to positive operators, and
$F(1) = \mathbb{1}$. Evaluated just on the indicator functions we get a positive operator
valued measure (POVM), from which the values for general $f$ are recovered by integration:
We have $F(f) = \int F(dx)\, f(x)$. Either the measure or the linear operator $F$ will be
called an observable, and the two are denoted by the same letter. Of course, an important
special case is that each value of the measure is a projection, which is equivalent to
$F(fg) = F(f)F(g)$. Such observables will be called projection valued (PVM).

How should we define the distance of observables now? An approach based on joint
distributions is not feasible, because very often positive operator valued measures
do not admit an extension to a joint observable, so the measured outputs of two
observables typically cannot be seen in the same experiment and compared for each
single shot separately. We can, however, compare distributions. And since equality
of observables $F_1, F_2$ means, by definition, that the probability measures
$\mu_{1,\rho}$ and $\mu_{2,\rho}$ coincide for every $\rho$, it is natural to say that two
observables are similar if they give similar probability distributions for all states, in
the sense of the metric defined previously. Hence we set, for any two observables
$F_1, F_2$ on the same metric output space $(X, d)$ and for quantum systems with the same
Hilbert space $\mathcal{H}$:

    $d(F_1, F_2) = \sup_{\rho} \sup_{f \in \Lambda} |\mathrm{tr}(\rho\,(F_1(f) - F_2(f)))|
                 = \sup_{f \in \Lambda} \|F_1(f) - F_2(f)\|$ ,    (9)
where the supremum over $\rho$ in the first line is over all density operators, and in the
second line we used that, for hermitian operators $A$, we can express the operator
norm as $\|A\| = \sup_\rho |\mathrm{tr}(\rho A)|$. Thus "$d(F_1, F_2) \le \varepsilon$" is
synonymous with the bound $|\mathrm{tr}(\rho F_1(f)) - \mathrm{tr}(\rho F_2(f))| \le \varepsilon$
on differences of expectation values, valid for all states $\rho$, and all bounded
functions $f$ with Lipschitz slope at most 1.
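
To see the definition at work in the simplest possible case, here is a toy sketch which is not taken from the text: two observables with only two outcomes $\{0, 1\}$, $d(0,1) = 1$, on a two-dimensional Hilbert space with real symmetric effects. All names (`op_norm_sym2`, `distance_two_outcome`, `sharp`, `noisy`) are assumptions of this illustration. Since $F_i(f) = f(0)(\mathbb{1} - F_i(\{1\})) + f(1) F_i(\{1\})$, the supremum in (9) reduces to the norm of $F_1(\{1\}) - F_2(\{1\})$.

```python
import math

def op_norm_sym2(m):
    """Operator norm of a real symmetric 2x2 matrix [[a, b], [b, c]]."""
    (a, b), (_, c) = m
    mean, radius = (a + c) / 2.0, math.hypot((a - c) / 2.0, b)
    return max(abs(mean + radius), abs(mean - radius))

def sub(m1, m2):
    """Entrywise difference of two 2x2 matrices."""
    return [[m1[i][j] - m2[i][j] for j in range(2)] for i in range(2)]

def distance_two_outcome(F1_one, F2_one):
    """d(F1, F2) for outcomes {0, 1}: F1(f) - F2(f) = (f(1) - f(0)) (F1({1}) - F2({1})),
    and |f(1) - f(0)| <= 1 on the Lipschitz ball, with equality allowed."""
    return op_norm_sym2(sub(F1_one, F2_one))

eps = 0.2
sharp = [[0.0, 0.0], [0.0, 1.0]]                 # projection valued
noisy = [[eps / 2, 0.0], [0.0, 1.0 - eps / 2]]   # same measurement, outcome randomized with probability eps
print(distance_two_outcome(sharp, noisy))        # close to eps / 2
```

In this toy case the distance is just half the probability of randomizing the outcome, uniformly over all input states.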

It is important to note that the dual characterization of the metric as Monge
distance cannot be transferred from the case of scalar probability measures to the
operator valued case. Of course, for any fixed $\rho$ we get a joint probability
distribution $\mu_{12,\rho}$ minimizing cost for the Monge problem of $\mu_{1,\rho}$ and
$\mu_{2,\rho}$. However, in contrast to the $\mu_{i,\rho}$, the function
$\rho \mapsto \mu_{12,\rho}$ is not affine in $\rho$, and hence there is no observable
$F_{12}$ such that $\mu_{12,\rho}(f) = \mathrm{tr}(\rho F_{12}(f))$, and even if there
happens to be some joint measurement $F_{12}$ with marginals $F_i$, providing an affine
family of joint distributions, this gives only a loose upper bound
$d(F_1, F_2) \le \|F_{12}(d)\|$.

2.3 Statement of the Theorem 

Let us now consider a quantum mechanical system with $n$ canonical degrees of freedom,
described in a Hilbert space $\mathcal{H}$. That is, there are self-adjoint operators
$P_\mu, Q_\mu$, $\mu = 1, \ldots, n$, satisfying the canonical commutation relations

    $i[P_\mu, Q_\nu] = \hbar\, \delta_{\mu\nu}\, \mathbb{1}$
    $i[P_\mu, P_\nu] = i[Q_\mu, Q_\nu] = 0$    (10)

on a dense set of vectors, on which all real linear combinations of these operators are
essentially self-adjoint. Note that we do admit additional degrees of freedom unrelated
to the $P_\mu, Q_\mu$ under consideration. Under these conditions there are joint spectral
measures for the $Q_\mu$, i.e., there is a unique projection valued observable $Q$ on
$X = \mathbb{R}^n$ such that $Q_\mu = \int Q(dx)\, x_\mu$. As a metric on $X$ we take some
metric derived from a norm $|\cdot|$ on position space, such as the Euclidean metric when
we consider a single particle. By $|Q|$ we denote the operator

    $|Q| = Q(|\cdot|) = \int Q(dx)\, |x|$ .    (11)

Similarly, we consider the momentum observable $P$, which is the joint spectral measure
of the $P_\mu$, and choose a suitable norm on momentum space.

Theorem. Let $Q, P$ be the position and momentum observables of a system with $n$
degrees of freedom in a Hilbert space $\mathcal{H}$. Let $M$ be an observable on
$\mathbb{R}^n \times \mathbb{R}^n$ with marginals $M_1$ and $M_2$. Then

    $d(Q, M_1) \cdot d(P, M_2) \ge C\hbar$ .    (12)

The best constant $C$ in this inequality is determined as $C\hbar = E_0^2/(4ab)$, where
$E_0$ is the lowest eigenvalue of the operator

    $K = a|Q| + b|P|$    (13)

for some positive constants $a, b > 0$. Equality in (12) holds for a suitable covariant
observable.
Of course, the numerical value of the constant $C$ does depend on the two metrics
chosen, and on the number $n$ of degrees of freedom. For a single degree of freedom,
with $|\cdot|$ the usual absolute value we get

    $C \approx 0.304745$ .    (14)

The unique covariant observable attaining this bound is determined (numerically) in
Section 3.2. It is not equal to the covariant observable based on coherent states, which
realizes the uncertainty product $C' = 1/\pi \approx 0.3183$.
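
A rough numerical cross-check of (14) is possible with a few lines of NumPy. The following is only a sketch under assumptions that are not part of the paper: $\hbar = a = b = 1$, a periodic box of length `box` with `n_grid` points, and $|P|$ realized as the Fourier multiplier $|k|$ on that grid. The function name `lowest_eigenvalue` and all parameter values are hypothetical choices, and the result carries a discretization error.

```python
import numpy as np

def lowest_eigenvalue(n_grid=512, box=40.0, a=1.0, b=1.0, hbar=1.0):
    """Lowest eigenvalue of a grid version of K = a|Q| + b|P| (one degree of freedom)."""
    dx = box / n_grid
    x = (np.arange(n_grid) - n_grid // 2) * dx          # position grid
    k = 2.0 * np.pi * np.fft.fftfreq(n_grid, d=dx)      # wave numbers of the grid
    # Unitary DFT matrix F; on the grid |P| acts as hbar * F^dagger diag(|k|) F.
    F = np.fft.fft(np.eye(n_grid), axis=0, norm="ortho")
    abs_p = hbar * (F.conj().T * np.abs(k)) @ F
    K = a * np.diag(np.abs(x)) + b * abs_p.real         # imaginary part is rounding noise
    return np.linalg.eigvalsh(K)[0]

E0 = lowest_eigenvalue()
C_numeric = E0**2 / 4.0                 # C*hbar = E0^2 / (4ab) with a = b = hbar = 1
C_coherent = 1.0 / np.pi                # uncertainty product of the coherent state observable
print(E0, C_numeric, C_coherent)
```

If the discretization is adequate, `C_numeric` should land near the value (14) and below `C_coherent`; refining `n_grid` and `box` is the natural way to probe how stable the digits are.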

2.4 Weight at infinity 

Before going into the proof of the Theorem, we have to be a bit more precise about
the class of functions $f$, for which we need expectations $\mu(f)$ and operators $F(f)$ to
be defined. This subsection is somewhat technical, and can be skipped by those who
are only interested in the construction of joint measurements saturating the bound.

The issues we discuss in this subsection do not require covariance, and make sense
for a general locally compact metric space $(X, d)$, in our case either phase space,
position space or momentum space. Integration on locally compact spaces can be
developed nicely as a theory of linear functionals ("Radon measures") on the space
$C_{00}(X)$ of continuous functions of compact support. This approach is advocated, e.g.,
by Bourbaki [9] (see also Dieudonné [10] for the simpler special case of metrizable
separable spaces, which is all we need here).

However, we need expectation values not just for $f \in C_{00}(X)$. For example, in order
to define the normalization of probability measures we need to integrate the function
"1". A large part of measure and integration theory is devoted to extending the
definition of integrals to larger and larger classes of functions. For the normalization
one defines

    $\mu(1) = \sup\{\mu(f) \mid f \in C_{00}(X),\ f \le 1\}$ .    (15)

The extension of $\mu(f)$ to all bounded measurable functions $f$ follows similar limit
processes. For our purposes, however, it is only necessary to evaluate expectations
on $f \in \Lambda$, so we can compute the distance of probability measures via Definition 1.
Therefore we will stay with the minimal space of functions necessary for that purpose,
which is the C*-algebra generated by $\Lambda$, the algebra $C_{uc}(X)$ of bounded uniformly
continuous functions on $X$.

Now in Section 4 we will construct directly some normalized positive linear functionals
(i.e., "states") $\mu$ on $C_{uc}(X)$, and we would like to conclude that such a $\mu$
defines a measure on $X$. The problem is, however, that since $1 \in C_{uc}(X)$, an equation
like (15) is now no longer a definition of the left hand side, but a property of the
functional $\mu$. And we will see that it may indeed fail to be true. In other words, for
such functionals the monotone convergence theorem $\sup_n \mu(f_n) = \mu(\sup_n f_n)$ for
the pointwise supremum of functions on $X$ may fail.

How does this fit in with the equivalence between 'measures as set functions' and
'measures as linear functionals' proclaimed at the beginning of this subsection and in
Section 2.1? This can be understood by considering the example of the algebra
$\mathcal{A}$ of bounded continuous functions on the open unit disk, which have a
continuous extension to the closed disk. As an algebra this is identical with the
continuous functions on the closed disk, but what is meant by 'pointwise supremum' now
depends on what we consider as the domain of these functions. For example, a point
measure on the boundary has the property of giving zero expectation to any function with
compact support inside the open disk, producing a failure of Eq. (15). It is clear from
this example that the equivalence between measures as set functions and as linear
functionals may require that we suitably extend the underlying space, i.e., we consider
measures on the closed disk rather than just the open disk.

The general situation for a locally compact metric space $X$ is quite similar to this
example. Like every commutative C*-algebra, the algebra $C_{uc}(X)$ is isomorphic to
the continuous functions $C(\hat X)$ on a compact space $\hat X$, called the Gelfand
spectrum of $C_{uc}(X)$. $\hat X$ can be constructed as the set of pure states on the
algebra $C_{uc}(X)$. Since evaluation at a phase space point is a pure state, we have
$X \subset \hat X$, and $\hat X$ is a compactification of $X$. The additional points
$\hat X \setminus X$ should be thought of as points at infinity, and clearly a measure may
be supported on such points so that the restriction of $\mu$ to $C_{00}(X)$ is zero, and
Equation (15) is violated as $1 \neq 0$. The points at infinity have a very rich
structure [11], but in this paper we are only interested in their collective
weight with respect to a probability measure, which is simply the difference between
left and right hand side of Equation (15): For the overall weight at infinity of a Radon
probability measure $\mu$ on $C_{uc}(X)$ we introduce the notation

    $\mu(\infty) = 1 - \sup\{\mu(f) \mid f \in C_{00}(X),\ f \le 1\}$ .    (16)

For a positive operator valued measure we can take exactly the same definition:
the supremum exists in the weak operator topology, because the net of functions $f$ is
directed. Equivalently, we can apply the scalar definition to every measure
$\mu_\rho(f) = \mathrm{tr}(\rho M(f))$, and define the operator weight at infinity by
$\mathrm{tr}(\rho M(\infty)) = \mu_\rho(\infty)$ for every $\rho$. The key observation,
allowing us later to eliminate weights at infinity, is the following

Lemma 2

1. For any Radon probability measures $\mu_1, \mu_2$ on a locally compact metric space
$(X, d)$: $d(\mu_1, \mu_2) < \infty$ implies $\mu_1(\infty) = \mu_2(\infty)$.

2. Let $M$ be an observable on phase space, whose marginals have finite distance to
the standard position and momentum observables, respectively. Then $M(\infty) = 0$.

Proof: As a net of functions $f_R \in C_{00}(X)$ we choose

    $f_R(x) = 1 - d(0, x)/R$  if $d(0, x) \le R$,
    $f_R(x) = 0$              if $d(0, x) > R$,    (17)

where $0 \in X$ is an arbitrarily chosen reference point. Since a locally compact metric
space is the union of the compact balls $\{x \mid d(0, x) \le R\}$, this family eventually
dominates every function $f \in C_{00}(X)$ with $f \le (1 - \varepsilon)$. Hence
$1 - \mu(\infty) = \lim_{R \to \infty} \mu(f_R)$ for every probability measure. Then, since
$R f_R \in \Lambda$,

    $|\mu_1(f_R) - \mu_2(f_R)| \le \frac{1}{R}\, d(\mu_1, \mu_2)$    (18)

and the first result follows by taking the limit $R \to \infty$.
For the second statement consider the inequality

    $1 - f_R(p) f_R(q) = (1 - f_R(p)) + f_R(p)(1 - f_R(q))$
                       $\le (1 - f_R(p)) + (1 - f_R(q))$ ,

apply $M$ and take the limit $R \to \infty$ to get $M(\infty) \le M_1(\infty) + M_2(\infty)$,
where $M_i$ are the two marginals. But since the standard position and momentum observables
have zero weight at infinity, part 1 of the Lemma shows that $M_i(\infty) = 0$, and hence
$M(\infty) = 0$.
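
The two elementary estimates used in this proof can be spot-checked numerically. The sketch below assumes the real line with reference point 0 and restricts (18) to point measures, for which $d(\delta_x, \delta_y) = |x - y|$; the function names are hypothetical.

```python
def f_R(x, R):
    """Tent function (17) on the real line, reference point 0."""
    return max(0.0, 1.0 - abs(x) / R)

def check_18(x, y, R, tol=1e-12):
    """(18) for point measures: |f_R(x) - f_R(y)| <= |x - y| / R."""
    return abs(f_R(x, R) - f_R(y, R)) <= abs(x - y) / R + tol

def check_product_bound(a, b, tol=1e-12):
    """1 - a*b = (1 - a) + a*(1 - b) <= (1 - a) + (1 - b) for a, b in [0, 1]."""
    lhs = 1.0 - a * b
    same = abs(lhs - ((1.0 - a) + a * (1.0 - b))) < tol
    return same and lhs <= (1.0 - a) + (1.0 - b) + tol

samples = [-30.0, -7.5, -1.0, 0.0, 0.25, 3.0, 12.0, 50.0]
print(all(check_18(x, y, R) for x in samples for y in samples for R in (1.0, 10.0, 100.0)))
print(all(check_product_bound(f_R(p, 10.0), f_R(q, 10.0)) for p in samples for q in samples))
# For any fixed point x, f_R(x) tends to 1 as R grows, so a point measure has no weight at infinity.
print([f_R(50.0, R) for R in (10.0, 100.0, 1000.0)])
```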

3 Covariant observables 

3.1 Phase space covariant observables 

It is not a priori clear that there exist approximate joint measurements of $P$ and $Q$,
making $\Delta Q = d(Q, M_1)$ and $\Delta P = d(P, M_2)$ finite. But there is a simple, and
even well-known construction for joint measurements of position and momentum achieving
just that. These phase space observables have the additional property that the unitary
groups of translations (generated by the momentum operators) and boosts (generated
by the position operators) act like a shift in phase space on the arguments of $M$. Let
us introduce the Weyl operators (phase space translations)

    $W(p, q) = \exp \frac{i}{\hbar}(q \cdot P - p \cdot Q)$ .    (19)

We will assume in this section that beyond the canonical ones under consideration
there are no additional degrees of freedom, i.e., the Weyl operators act irreducibly
on $\mathcal{H}$. Then by von Neumann's Uniqueness Theorem [12] we can take, up to unitary
equivalence, $\mathcal{H} = L^2(\mathbb{R}^n, dx)$, $Q_\mu$ the multiplication by the
coordinate $x_\mu$ and $P_\mu = \frac{\hbar}{i} \frac{\partial}{\partial x_\mu}$. The Weyl
operators in this representation become

    $(W(p, q)\psi)(x) = \exp \frac{i}{\hbar}\left(-\frac{p \cdot q}{2} - p \cdot x\right) \psi(x + q)$ .    (20)

Then, denoting by $(\tau_x f)(y) = f(y - x)$ the translate of a function on a vector
space by $x$, the shift covariance property of the standard position observable
becomes

    $Q(\tau_q f) = W(p, q)^* Q(f) W(p, q)$ ,    (21)

for all bounded measurable $f$ and all $p, q \in \mathbb{R}^n$. There is an analogous
property of $P$, and we define a covariant phase space observable by the equation

    $M(\tau_{(p,q)} f) = W(p, q)^* M(f) W(p, q)$ .    (22)

It turns out that there is a closed formula for all such observables, described in the
following Lemma. Recall from the previous section that $C_{uc}(X)$ denotes the algebra of
bounded uniformly continuous functions on phase space $X$, and $C_{00}(X)$ the subalgebra
of functions with compact support. The Lemma is well known, and versions of it can
be found in the literature.

Lemma 3 Let $M$ be a covariant observable on phase space, i.e., a linear map
$M : C_{uc}(X) \to \mathcal{B}(\mathcal{H})$, taking positive functions to positive operators,
and satisfying Eq. (22). Suppose that $M$ has zero weight at infinity, i.e.,
$\sup\{M(f) \mid f \in C_{00}(X),\ f \le 1\} = \mathbb{1}$.
Then there is a positive operator $m$ with $\mathrm{tr}(m) = 1$ such that

    $M(f) = \int \frac{dp\, dq}{(2\pi\hbar)^n}\, f(p, q)\, W(p, q)^* m\, W(p, q)$ .    (23)
Conversely, this formula defines a covariant observable for every $m$. The integral is a
weak integral, i.e., for every density operator $\rho$ we have to compute the expectation as

    $\mathrm{tr}(\rho\, M(f)) = \int \frac{dp\, dq}{(2\pi\hbar)^n}\, f(p, q)\, \mathrm{tr}(\rho\, W(p, q)^* m\, W(p, q))$ .    (24)

Here the operator $m$ is appropriately called a density operator for two separate
reasons: on the one hand as a positive operator of trace 1, and on the other hand as
the "Radon-Nikodym derivative" of the observable at the origin. The fact that the
trace under the integral is an integrable function, and, in fact, a probability density
on phase space follows from the fundamental "square integrability" property of Weyl
operators. In the special case that $m$ is a coherent state (ground state of an oscillator
Hamiltonian) this probability density is also known as the Husimi function of $\rho$.

It is now easy to compute the marginals of $M$. Of course, the marginals $M_i$
will inherit a covariance property: For example, the position-like marginal $M_1$ will
have the same property (21) as the position observable. Since each $M_1(f)$ commutes
with momentum translations, these operators must be functions of position, and the
covariance for position shifts forces $M_1$ to be equal to $Q$ up to some smearing by
convolution with a fixed probability density. Explicitly, we get the required density
from the form of the Weyl operators. The result is

    $m^Q(f) = \mathrm{tr}(\Pi^* m \Pi\, Q(f))$ ,    (25)

where $\Pi$ is the parity operator $(\Pi\psi)(x) = \psi(-x)$. With the analogous expression
for momentum and the definition (6) of convolution we then have

    $M_1 = Q * m^Q$ and $M_2 = P * m^P$ .    (26)

To summarize: the marginals of a covariant phase space observable can be simulated
in the following way: one simply makes the corresponding ideal position or
momentum measurement, and adds some noise from a source which is independent of
the quantum state. The noise distributions $m^Q$ and $m^P$ are the position and
momentum distributions of a density operator (namely $\Pi^* m \Pi$), hence there is the
usual tradeoff: if we insist on a good position measurement, i.e., sharply peaked $m^Q$,
then $m^P$ will be very spread out, and much noise is added to momentum, and conversely.
In the following section we make this quantitative in the sense of the distance of
observables.

3.2 Optimizing over covariant observables 

The uncertainties $d(M_1, Q)$ follow from Eq. (26) and Lemma 1: we get

    $d(M_1, Q) = \int m^Q(dx)\, |x| = \mathrm{tr}(m |Q|)$ ,    (27)

and the analogous relation for $P$. Here the inequality "$\le$" follows from Lemma 1, and
equality follows from the formula $d(\delta_y, \nu) = \int \nu(dx)\, d(x, y)$ and the
observation that there are states $\rho$ whose position distribution is arbitrarily close
to a point measure. We now have to determine which combinations of two positive numbers

    $(d(M_1, Q), d(M_2, P)) = (\mathrm{tr}(m |Q|), \mathrm{tr}(m |P|))$
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can be obtained by varying the density operator m. Because the coordinates depend 
linearly on m, this is a convex set in the plane. Moreover, if one pair (Si, 82) is pos- 
sible, then so is (81,82) with 8[ > 8%, because we can replace m by a suitable average 
over translates, and can vary the distribution of translation vectors from sharply con- 
centrated to very broad. An important point is that we can also apply the dilation 
symmetry Q v t— > AQ„, P v 1— > \~ 1 P V . This is shown in Figure with every point 
the admissible region contains the entire hyperbola through that point. In order to 




Figure 3: Admissible region for pairs (AQ,AP) = (tr(p |Q|), tr(p |P|)). The 
tangent shown is the contour line of aAQ + bAP realizing the minimum expec- 
tation of K . 

find the parameter Ch for the boundary hyperbola, consider the lowest admissible 
expectation Eo for a linear combination 

K = a\Q\ +6|P| , (28) 

with a, b > 0. This is the same as the smallest aAQ + bAP with (AQ, AP) in 
the admissible region of Figure 13.21 Clearly this will be attained on the boundary 
hyperbola (AQ)(AP) = Ch, which gives E = 2VabCh. Solving for C we find the 
statement of the Theorem, for the special case of covariant M. 
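The step from the boundary hyperbola to $E_0$ can be spelled out with the inequality between arithmetic and geometric means. The following worked derivation is a sketch using only the symbols of Eq. (28). The numerical check in the last line assumes $a = b = \hbar = 1$ and takes the value $C = 0.3047$ reported for one degree of freedom.

```latex
\begin{align*}
a\,\Delta Q + b\,\Delta P
  &\ge 2\sqrt{ab\,\Delta Q\,\Delta P}
   = 2\sqrt{ab\,C\hbar}
  && \text{on the hyperbola } \Delta Q\,\Delta P = C\hbar, \\
\text{with equality iff}\quad a\,\Delta Q &= b\,\Delta P = \sqrt{ab\,C\hbar}, \\
E_0 = 2\sqrt{ab\,C\hbar}
  \quad&\Longleftrightarrow\quad
  C = \frac{E_0^2}{4ab\,\hbar}, \\
a = b = \hbar = 1,\ C = 0.3047
  \quad&\Longrightarrow\quad
  E_0 = 2\sqrt{0.3047} \approx 1.104 .
\end{align*}
```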

We still have to clarify whether the bound is attained, i.e., whether there is really
an eigenvalue $E_0$ at the bottom of the spectrum of $K$. This is equivalent to the
same question for the top of the spectrum of the operator $(K + \mathbb{1})^{-1}$, and we will
show it by verifying that $(K + \mathbb{1})^{-1}$ is a compact operator. This also shows that
the relevant eigenvalue has finite multiplicity. A quick way to show compactness is
by the correspondence theory of Ref. [11]: we only need to show that $(p, q) \mapsto
W(p,q)^* (K + \mathbb{1})^{-1} W(p,q)$ is continuous in norm (which is obvious by a resolvent
equation), and that the function

$k(p,q) = \langle W(p,q)\Phi,\, (K + \mathbb{1})^{-1} W(p,q)\Phi \rangle$ (29)

goes to zero at infinity, when $\Phi$ is some fixed Gaussian wave function. But since the
operator inverse is decreasing with respect to operator ordering, and $K \ge b|P|$, we have
$(K + \mathbb{1})^{-1} \le (b|P| + \mathbb{1})^{-1}$, from which we get the estimate
$k(p,q) \le \mathrm{const}\,(b|p| + 1)^{-1}$, and a similar estimate for $q$. Hence $k$ goes to zero.

In practice the computation of $E_0$ is best done by using the symmetry of the problem.
For several degrees of freedom and Euclidean norms this is rotation symmetry;
for one degree of freedom, we still have reflection symmetry. In addition, it is useful
to take $a = b = \hbar = 1$, and use the Fourier transform symmetry of $K$. Since the
$E_0$-eigenspace is finite dimensional, we can seek joint eigenvectors of $K$ and the
symmetries. Then $K$ is truncated to a subspace spanned by finitely many eigenfunctions
of the harmonic oscillator with fixed symmetry, and the resulting matrix (which can
be constructed symbolically, i.e., with infinite precision) is numerically diagonalized.
One readily finds that the ground state of $K$ is close to the oscillator ground state (see
Fig. 4), and is realized for angular momentum $\ell = 0$ (resp. even parity), and vectors
invariant under Fourier transform.

Figure 4: The ground state wave function of the operator $K$ (solid line), compared
to the Gaussian oscillator ground state (dashed) for one degree of freedom.

For more than one degree of freedom we get the following table:



dimension    C         C (Gaussian)
1            0.3047    1/π = 0.3183
2            0.7628    π/4 = 0.7853
3            1.2457    4/π = 1.2732
42           20.710    20.751
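Parts of this table can be checked with a short stdlib-only sketch. It rests on two assumptions that are not stated in the text. First, the comparison column appears to follow the closed form $(\Gamma((n+1)/2)/\Gamma(n/2))^2$ for an isotropic minimum-uncertainty Gaussian in $n$ dimensions, which is inferred here from the entries $1/\pi$, $\pi/4$, $4/\pi$. Second, for one degree of freedom with $a = b = \hbar = 1$, $K$ is truncated to the even, Fourier-invariant oscillator states $h_0, h_4, h_8, \dots$, where the matrix elements of $|P|$ and $|Q|$ coincide. All function names are hypothetical, and a shifted power iteration stands in for the numerical diagonalization purely to avoid external libraries.

```python
import math


def gaussian_constant(dim):
    """(Gamma((dim+1)/2) / Gamma(dim/2))**2, the assumed closed form for the
    Gaussian comparison column (product of mean absolute noises over hbar)."""
    return math.exp(2.0 * (math.lgamma((dim + 1) / 2.0) - math.lgamma(dim / 2.0)))


def hermite_coeffs(n):
    """Integer coefficients of the physicists' Hermite polynomial H_n, lowest power first."""
    previous, current = [1], [0, 2]           # H_0 and H_1
    if n == 0:
        return previous
    for k in range(1, n):
        nxt = [0] + [2 * c for c in current]  # 2 x H_k
        for i, c in enumerate(previous):      # minus 2 k H_{k-1}
            nxt[i] -= 2 * k * c
        previous, current = current, nxt
    return current


def abs_x_element(m, n):
    """<h_m, |x| h_n> for even m, n, using  integral |x| x^(2k) exp(-x^2) dx = k!."""
    total = 0                                 # exact integer arithmetic
    for i, a in enumerate(hermite_coeffs(m)):
        for j, b in enumerate(hermite_coeffs(n)):
            if a and b:
                total += a * b * math.factorial((i + j) // 2)
    norm_sq = 2**m * math.factorial(m) * 2**n * math.factorial(n)
    return total / (math.sqrt(math.pi) * math.sqrt(norm_sq))


def lowest_eigenvalue(matrix, iterations=2000):
    """Smallest eigenvalue of a symmetric matrix by power iteration on (shift - matrix)."""
    size = len(matrix)
    shift = 1.0 + max(sum(abs(x) for x in row) for row in matrix)  # above the spectrum
    v = [1.0] + [0.0] * (size - 1)
    for _ in range(iterations):
        w = [shift * v[i] - sum(matrix[i][j] * v[j] for j in range(size))
             for i in range(size)]
        length = math.sqrt(sum(x * x for x in w))
        v = [x / length for x in w]
    mv = [sum(matrix[i][j] * v[j] for j in range(size)) for i in range(size)]
    return sum(v[i] * mv[i] for i in range(size))  # Rayleigh quotient


def truncated_constant(num_states):
    """Upper bound E^2 / 4 on C from K = |Q| + |P| truncated to h_0, h_4, h_8, ..."""
    levels = [4 * k for k in range(num_states)]
    # On this symmetry sector |P| and |Q| have the same matrix, so K restricts to 2 |Q|.
    k_matrix = [[2.0 * abs_x_element(m, n) for n in levels] for m in levels]
    return lowest_eigenvalue(k_matrix) ** 2 / 4.0


print([round(gaussian_constant(d), 4) for d in (1, 2, 3, 42)])
print([round(truncated_constant(s), 4) for s in (1, 2, 6)])
```

Because the truncation is variational, each value of `truncated_constant` is an upper bound on $C$ that can only decrease as more states are included. With a single state it reproduces the Gaussian value $1/\pi$.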



4 Reduction to the covariant case 

In this section we will prove that in order to construct a joint measurement $M$ with
small uncertainties we can restrict attention to the covariant observables studied in
the previous section. The basic idea is to average over phase space translations, thus
turning a given observable $M$ into a covariant one $M_{\mathrm{av}}$ with at least as small error
bounds.

The basis for the construction is a so-called invariant mean [16] on the group
of phase space translations: this associates to any bounded continuous function $u$ on
phase space a number $\eta(u)$ such that $u \mapsto \eta(u)$ is linear, positive on positive functions,
normalized as $\eta(1) = 1$, and invariant in the sense that $\eta(\tau_x u) = \eta(u)$. The existence
of invariant means is by no means obvious. Any constructive procedure based on
integrating over larger and larger sets, and dividing by the volume of the set, will be
convergent only for 'well behaved' functions, such as almost periodic ones or functions
going to zero at infinity. The latter always average to zero, i.e., an invariant
mean has weight at infinity $\eta(\infty) = 1$ in the sense of Eq. (16). A functional $\eta$ defined
on all bounded continuous functions indeed cannot be constructed explicitly, and we
know its existence only via the axiom of choice.
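The naive procedure mentioned above, averaging over larger and larger boxes, can be tried out in one dimension. The sketch below is only an illustration of that finite-box average for two 'well behaved' functions. It does not construct an invariant mean, which by the remark above is not possible explicitly. The function names, the two test functions, and the midpoint rule are choices made for this example.

```python
import math


def box_average(f, radius, steps=100_000):
    """Average of f over [-radius, radius], computed with the midpoint rule."""
    width = 2.0 * radius / steps
    total = 0.0
    for i in range(steps):
        x = -radius + (i + 0.5) * width
        total += f(x)
    return total * width / (2.0 * radius)


def vanishing(x):
    """A function going to zero at infinity; its box averages tend to 0."""
    return 1.0 / (1.0 + x * x)


def almost_periodic(x):
    """Two incommensurate frequencies; the box averages tend to 1/2."""
    return math.cos(x) ** 2 + 0.5 * math.cos(math.sqrt(2.0) * x)


for radius in (10.0, 100.0, 1000.0):
    print(radius, box_average(vanishing, radius), box_average(almost_periodic, radius))
```

For a general bounded continuous function the box averages need not converge at all, which is the reason the text has to fall back on an abstract invariant mean.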

Now let $f$ be a uniformly continuous function on phase space, and $\rho$ a density
operator. Then we define the operator $M_{\mathrm{av}}(f)$ by

$\mathrm{tr}(\rho\, M_{\mathrm{av}}(f)) = \eta(u(\rho,f))$ , (30)
where $u(\rho,f)(p,q) = \mathrm{tr}\bigl(W(p,q)^* \rho\, W(p,q)\, M(\tau_{(p,q)} f)\bigr)$ .

The function $u(\rho, f)$ is designed so that it is constant if $M$ is covariant, in which
case $M_{\mathrm{av}} = M$. For arbitrary $M$ it is still always uniformly continuous, because
for uniformly continuous functions $f$ translation is norm continuous, and because
Weyl operators are strongly continuous in $(p,q)$, making $(p, q) \mapsto W(p,q)\rho\, W(p,q)^*$
continuous in trace norm. The function $u$ is also bounded by the upper bound $\|f\|$
for $f$. It follows that the invariant mean $\eta$ is applicable, and that the right hand
side of (30) is a bounded linear functional on the convex set of density operators, and,
consequently, there is a unique bounded operator $M_{\mathrm{av}}(f)$. Obviously, $M_{\mathrm{av}}(f)$ is linear
in $f$, positive on positive functions, and normalized ($M_{\mathrm{av}}(1) = \mathbb{1}$). By invariance of
the mean it is also evident that it is a covariant observable.

The crucial point we have to establish now is that averaging does not increase
uncertainty. To this end, let us consider the set $\mathcal{M}(\delta_1,\delta_2)$ of observables $M$ with
$d(Q,M_1) \le \delta_1$ and $d(P,M_2) \le \delta_2$. In other words, we take $\mathcal{M}(\delta_1,\delta_2)$ as the set
of observables $M$ on phase space, such that for all density operators $\rho_1, \rho_2$ and all
Lipschitz functions $f_1, f_2 : \mathbb{R}^n \to \mathbb{R}$ with $f_1, f_2 \in \Lambda$:

$\mathrm{tr}(\rho_1 M_1(f_1)) - \mathrm{tr}(\rho_1 Q(f_1)) \le \delta_1$

$\mathrm{tr}(\rho_2 M_2(f_2)) - \mathrm{tr}(\rho_2 P(f_2)) \le \delta_2$ . (31)

Suppose an observable $M$ satisfies these bounds. Then these relations remain true
if we replace $\rho_i$ by $W(p,q)\rho_i\, W(p, q)^*$, and the functions $f_1, f_2$ by appropriately shifted
ones. The terms involving $P$ and $Q$ are unchanged by this, but the terms with $M$
become continuous functions of $(p,q)$ of the kind $u(\rho, f)$ in (30), to which we may
apply the invariant mean $\eta$. As a result we find that

$M \in \mathcal{M}(\delta_1,\delta_2) \;\Rightarrow\; M_{\mathrm{av}} \in \mathcal{M}(\delta_1,\delta_2)$ . (32)

Hence without increasing the uncertainty bounds we may replace $M$ by the covariant
observable $M_{\mathrm{av}}$.
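The averaging step can be written out for the position marginal as follows. This is a sketch under notational assumptions that are not fixed in the text: $\rho_{p,q}$ and $f_1^{(p,q)}$ denote the shifted state and the shifted function, $F_1$ denotes the function on phase space that depends on the position variable only through $f_1$, and $(M_{\mathrm{av}})_1$ is the position marginal of $M_{\mathrm{av}}$. Only linearity, positivity and normalization of $\eta$ are used.

```latex
\begin{align*}
u(\rho,F_1)(p,q) - \mathrm{tr}\bigl(\rho_{p,q}\, Q(f_1^{(p,q)})\bigr)
  &\le \delta_1
  && \text{for all } (p,q), \text{ by (31)}, \\
\mathrm{tr}\bigl(\rho_{p,q}\, Q(f_1^{(p,q)})\bigr)
  &= \mathrm{tr}\bigl(\rho\, Q(f_1)\bigr)
  && \text{by covariance of } Q, \\
\eta\bigl(u(\rho,F_1)\bigr) - \mathrm{tr}\bigl(\rho\, Q(f_1)\bigr)
  &\le \eta(\delta_1) = \delta_1
  && \text{since } \eta \text{ is positive and } \eta(1) = 1, \\
\mathrm{tr}\bigl(\rho\, (M_{\mathrm{av}})_1(f_1)\bigr) - \mathrm{tr}\bigl(\rho\, Q(f_1)\bigr)
  &\le \delta_1
  && \text{by the definition (30)}.
\end{align*}
```

The same four lines with $P$, $f_2$ and $\delta_2$ give the momentum bound, which together yield (32).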

This reduces the problem of characterizing the admissible pairs $(\delta_1, \delta_2)$ to Section 3,
except for two issues. The first is that throughout that section we had assumed the
Weyl operators to act irreducibly, i.e., there were no further degrees of freedom present.
However, von Neumann's uniqueness Theorem [12] asserts that any system of Weyl
operators can be decomposed into a direct sum of irreducible systems. Let $p_\alpha$ be the
projections onto the irreducible direct summands, which by definition commute with
all Weyl operators. Then we claim that $M \in \mathcal{M}(\delta_1,\delta_2)$ implies that
$\sum_\alpha p_\alpha M p_\alpha \in \mathcal{M}(\delta_1,\delta_2)$. The argument for this is averaging as before, but over the group of unitaries
of the form $U = \sum_\alpha u_\alpha p_\alpha$ with $|u_\alpha| = 1$. Clearly, such a direct sum of observables lies
in $\mathcal{M}(\delta_1,\delta_2)$ iff every summand does. On the one hand, this means that additional
degrees of freedom cannot increase the set of admissible $(\delta_1,\delta_2)$, and on the other
hand it means that if such a pair is admissible for irreducible Weyl systems, we can
construct observables with this bound for arbitrary systems as direct sums.

The second issue we have to address is that $M_{\mathrm{av}}$ comes out as a Radon observable,
i.e., we get an operator $M_{\mathrm{av}}(f)$ for every bounded uniformly continuous function $f$,
but such an observable might have non-zero weight at infinity. In fact, it is typical for
constructions based on a compactness argument (such as our appeal to the existence
of invariant means) that one has to verify in the end that the construction does not
lead to a wild element of a compactified space. For example, if we had omitted the
Weyl operators from the definition of $u$, we could have still obtained some observable
$M_{\mathrm{av}}$ from Eq. (30) by averaging. But rather than getting a covariant observable we
would have found the observable $M_{\mathrm{av}}(f) = \eta(f)\,\mathbb{1}$, which has only weight at infinity.
This is the reason we had to discuss weights at infinity in Section 2.4. In fact all
the hard work was already done there: when $\delta_1$ and $\delta_2$ are finite, $M_{\mathrm{av}} \in \mathcal{M}(\delta_1,\delta_2)$
implies that $M_{\mathrm{av}}(\infty) = 0$ by Lemma 2, and therefore, by an earlier Lemma, that we can
construct $M_{\mathrm{av}}$ by integration over a density operator, and compute and optimize the
uncertainties as in Section 3.2. This concludes the proof of the Theorem.

5 Other Uncertainty Relations 

Of course, all uncertainty relations are related. Some variants that are closely related
to the present paper will be briefly commented on in this section.

5.1 Measurement as Preparation 

One way of reducing measurement uncertainty to preparation uncertainty is the
projection postulate: according to this postulate the state of a system after measurement
is an eigenstate of the measured observable for that eigenvalue which happened to
be the outcome of the measurement. Let us assume some approximate version of this
postulate holds for the approximate measurement $Q'$ for the microscope (of course,
this restricts the applicability of our argument). Then by conditioning on the particular
value of $q'$ obtained from the $Q'$-measurement, we could understand the position
measurement as a preparation of states with position and momentum spreads $\sim \Delta Q$
and $\Delta P$. The relation between $\Delta Q$ and $\Delta P$ would then be just another special case
of the Preparation Uncertainty Relation. The $\Delta P$ here would be the total variance
of the momentum distribution after the measurement, i.e., not really that part of
momentum uncertainty introduced by the measurement itself. It could be much smaller
than the initial momentum spread. So this reduction of measurement uncertainty to
preparation uncertainty is straightforward only if we know that the initial state has
sharp momentum.

5.2 Variance of covariant observables 

The curious constant 0.3047 in the relation we prove is perhaps not so strange if one
notes that the same constant appears in the preparation uncertainty, if we choose
to quantify the spread $\Delta Q$ of position not by the square root of a second moment,
but by an absolute first moment. In fact, this is the way the constant was derived
in Section 3.2. So it is suggestive to look for an interpretation of $\Delta Q$ and $\Delta P$ for
measurement uncertainty, which would also bring the constant to $\hbar/2$. (From talks I
gave about the subject I know some colleagues find fault with any other constant.) This
can be done at the expense of the Kantorovich interpretation of the Monge distance
as a worst case difference of expectation values, by using a cost function for transport
which grows quadratically with the distance (also known as the Wasserstein 2-metric).

This is especially suggestive for the purely covariant case, in which the marginals 
of joint measurements are equivalent to adding noise from an external source: one can 
then simply take the second moment of the noise. This approach has been suggested 
also by Holevo. 
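For the covariant case this second-moment reading can be made explicit. The following lines are a sketch, assuming the uncertainties are defined as the root-mean-square size of the added noise. The symbols $\Delta_2 Q$ and $\Delta_2 P$ are introduced here only for illustration, and $\mathrm{Var}_m$ denotes the variance in the state $m$.

```latex
\begin{align*}
\Delta_2 Q &:= \Bigl(\int m^Q(dx)\, x^2\Bigr)^{1/2} = \sqrt{\mathrm{tr}(m\,Q^2)}, \qquad
\Delta_2 P := \Bigl(\int m^P(dp)\, p^2\Bigr)^{1/2} = \sqrt{\mathrm{tr}(m\,P^2)}, \\
\Delta_2 Q\,\Delta_2 P
  &\ge \sqrt{\mathrm{Var}_m(Q)\,\mathrm{Var}_m(P)}
  \ge \frac{\hbar}{2},
\end{align*}
```

Equality holds when $m$ is a centered minimum-uncertainty Gaussian. The first step uses that a second moment about zero is at least the variance, and the second step is the usual preparation uncertainty relation applied to the density operator $m$.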

5.3 Ozawa's Approach 

In a series of recent papers, M. Ozawa has also studied the measurement
aspect of uncertainty. For this he considers measurements described as detailed cou- 
plings to an environment. Then one can explicitly point out a selfadjoint operator 
of the combined object-apparatus system which describes the momentum after the 
measurement. The 'perturbation of momentum' by the measurement is then repre- 
sented by the difference of the momentum operators before and after the measurement 
interaction, and quantified by the expectation of the square of this operator. 

This is definitely a departure from the operational approach to quantum mechan- 
ics, since this difference of non-commuting operators is not accessible in the given 
experiment. Of course, any operator represents an observable. But to find a device 
measuring just this operator is a highly non-trivial task. In contrast, in our approach 
only the statistics of measurements on the joint measuring device itself enters. 

Nevertheless, there are interesting aspects in Ozawa's approach. In particular, his 
analysis applies to every input state separately, whereas our figures of merit involve a 
supremum over all input states. Further relationships remain to be clarified.